Machine Learning Advancements and Usage in Neuropsychological Testing and Scoring

National Association of Psychometrists 2025 Annual Meeting

Quinton Quagliano, M.S., C.S.P.

2025-06-13

1 Introduction

1.1 Follow Along

1.2 Learning Objectives

  • Describe the general goals of data analysis and statistics in both research and applied settings

  • Understand the core concepts and vocabulary used in machine learning/AI techniques

  • Compare and contrast the use-cases for “classic” statistical models versus those better addressed by machine learning

  • Survey the current state of research with these methods, specifically related to the context of neuropsychological assessment

  • Critically evaluate the practicality of using these methods in clinics, as well as the procedural, ethical, and statistical problems associated with the methods

1.3 Motivation

  • Born out of inspiration and frustration; neuropsychology has been credibly accused of “falling behind” (Miller & Barr, 2017; Singh & Germine, 2021)

  • Public and corporate pressure to adopt new and flashy technology, many already see adoption of AI by providers as a goal (Hou et al., 2024)

  • Omnipresence of AI- and machine learning-based tools, and proliferation of AI-focused academic research, though this research does not always mesh with non-AI-focused work (Duede et al., 2024)

  • Reflecting on my own role as both educator and consumer

  • Sharing my perspective of a cautious/skeptical optimist

1.4 Purpose

  • Today, I will primarily focus on methods described as machine learning and “AI”, and their use cases in neuropsychological evaluation, specifically the prediction of patient cognition from psychological test scores.

  • Review some prominent and recent findings from this booming area of research

  • These new techniques bring new benefits (which is exciting), but new challenges as well…

  • Disclaimer: Some cited articles are pre-prints that have not yet undergone peer review; please review cited studies before using them for decision-making

1.5 Why Use Data Analysis and Statistics in Neuropsych Assessment?

  • Evidence-based assessment

  • Build rapport, trust, and education in patients

  • Improve validity and reliability of tests and results

  • Public and academic perception of neuropsychology as valuable patient service

  • Refinement of accuracy of diagnosis in assessment

2 Old Statistics, New Statistics

“…statistical methods have a long-standing focus on inference, which is achieved through the creation and fitting of a project-specific probability model… By contrast, ML [machine learning] concentrates on prediction by using general-purpose learning algorithms to find patterns in often rich and unwieldy data” (Bzdok et al., 2018)

2.1 Review of Inferential and Frequentist Statistics

  • Techniques such as z-test, t-test, ANOVA, Pearson’s R, OLS linear regression and many others

  • Result in p-values, beta coefficients, effect sizes, etc.

  • Value comes from estimating probabilities and magnitude of outcomes, resulting from different variables

  • More explainable, tangible results

  • Hypothetical Example

    • Linear regression predicting score on MoCA from age, education, gender (see coefficient table below)
Variable                 | Beta  | P-value
Age                      | -0.39 | 0.04
Education                | 1.94  | 0.001
Gender (Male Dummy Code) | 0.03  | 0.39
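The estimation behind such a coefficient table can be sketched in a few lines. This is a minimal illustration with one predictor (education) and invented data, not values from any real dataset or the hypothetical table above:

```python
# Minimal sketch of ordinary least squares with one predictor, predicting a
# hypothetical MoCA score from years of education. All data are invented for
# illustration; real analyses would use statistical software.

education = [8, 10, 12, 12, 14, 16, 16, 18]
moca = [20, 22, 23, 24, 25, 27, 26, 29]

n = len(education)
mean_x = sum(education) / n
mean_y = sum(moca) / n

# Closed-form simple regression: beta = Cov(x, y) / Var(x)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(education, moca))
sxx = sum((x - mean_x) ** 2 for x in education)
beta = sxy / sxx
intercept = mean_y - beta * mean_x

print(f"predicted MoCA = {intercept:.2f} + {beta:.2f} * education")
```

With multiple predictors (age, education, gender), the same idea generalizes to solving the normal equations, which statistical packages handle internally.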

2.2 Inferential Stats in the Wild

  • Evidence of (convergent/concurrent) validity: correlation of FSIQ between WAIS-IV and WAIS-5 - r = 0.92, statistically significant (Wechsler, 2025, p. 85)

  • Quantifying average differences between different types of individuals: WMS-IV LM I comparison between MCI (M = 8.4, SD = 2.7) and matched control group (M = 11.4, SD = 3.0) – t = 4.93, p < 0.01. (Wechsler, 2009, p. 113)

  • Developing appropriate normative data to compare our patients against and establish relative cognitive ability - examples from (Mitrushina et al., 2005, p. 649):

    • Rooted in meta-analytical techniques
    • Predicting Trail Making Test score from age -> 26.50094 - 0.2665049(age) + 0.0069935(age²) (+ additional education correction)
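Applying a regression-based norm like this is just arithmetic. The sketch below assumes the second age term is quadratic (age²), as is typical for these normative equations (the superscript appears lost in the formatting above), and omits the education correction:

```python
# Sketch of applying a quadratic age-based normative equation
# (coefficients from Mitrushina et al., 2005) to obtain a predicted
# Trail Making Test score. The education correction is omitted here.

def predicted_tmt(age: float) -> float:
    # intercept + linear age term + quadratic age term
    return 26.50094 - 0.2665049 * age + 0.0069935 * age ** 2

# A patient's obtained score can then be compared against this prediction
print(f"predicted score at age 70: {predicted_tmt(70):.1f}")
```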

2.3 Overview of Machine Learning (ML)

  • Involves some type of computational “learning” -> model iterates through available data and “trains” in some manner

  • Has existed for a long time, but has scaled up greatly with exponential rise in technology

  • Central goal of improving how accurately we can predict an outcome of interest

    • Prediction of a continuous outcome -> regression
    • Prediction of a categorical outcome -> classification
    • Instead of focus on p-values; more focus on Mean Squared Error (MSE), R-squared, etc.
  • De-prioritizes giving exact, interpretable values to variables

  • Is especially well suited to dealing with MANY different variables at one time

  • Numerous techniques in this broad family - e.g. Classification and Regression Trees (CART), Random Forest (RF), Gradient-boosted Models (GBM), etc.
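The iterative "training" idea above can be illustrated with the simplest possible case: gradient descent repeatedly adjusting a single parameter to shrink mean squared error (MSE). The data and learning rate are invented for illustration:

```python
# Toy illustration of a machine learning "training loop": gradient descent
# minimizing mean squared error (MSE) for a one-parameter model y ≈ w * x.
# All values are invented for illustration.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x

w = 0.0                      # start from an uninformed parameter
lr = 0.01                    # learning rate

for step in range(500):      # iterate: predict, measure error, adjust
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"learned w ≈ {w:.2f}, MSE ≈ {mse:.4f}")
```

Techniques like random forests or gradient-boosted models replace this single parameter with thousands of split points or trees, but the underlying loop of predict-measure-adjust is the same idea.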

2.4 Example: CART Model

  • Example - CART Model (Lavery et al., 2007)

  • Screener test (MMSE) score used as a predictor, alongside other cognitive tests, with dementia as outcome

  • MMSE scores are given binary splits (by the model) that produce distinct classification odds of dementia
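A toy sketch of the binary-split idea behind CART: scan candidate cutoffs on a screener score and keep the one with the fewest misclassifications. The MMSE scores and dementia labels below are invented; a real CART model (as in Lavery et al., 2007) grows a full tree of such splits and typically uses impurity measures rather than raw error:

```python
# Toy sketch of a CART-style binary split: find the MMSE cutoff that best
# separates dementia from non-dementia cases. All data are invented.

mmse = [18, 20, 22, 23, 25, 27, 28, 29, 30, 30]
dementia = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]   # 1 = dementia

def misclassified(cutoff):
    # predict dementia when the score falls at or below the cutoff
    return sum((s <= cutoff) != bool(d) for s, d in zip(mmse, dementia))

# evaluate every observed score as a candidate cutoff, keep the best
best_cutoff = min(sorted(set(mmse)), key=misclassified)
print(f"split at MMSE <= {best_cutoff}, errors: {misclassified(best_cutoff)}")
```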

2.5 What About “AI”?

  • Artificial intelligence, as we know it today, is largely a marketing term encompassing extremely powerful and computationally expensive ML models (Toh et al., 2019)

  • Some literature reviews have tried to distinguish the terms (Kühl et al., 2022), but the underlying mathematical methods are the same as in ML.

  • Still focused on stellar prediction via especially large datasets and high computation power

  • “Chat” services and note-writing assistance tools use a framework called “large language models” (LLMs) and are already finding use in medical settings (Thirunavukarasu et al., 2023).

  • LLMs involve predicting appropriate human-like language (the outcome) in response to prompts, context, and training data (the predictors).
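A drastically simplified sketch of that next-word-prediction idea: a bigram counter that "predicts" the word most often following a given word in its training text. Real LLMs use neural networks with billions of parameters rather than counts, and the training sentence here is invented:

```python
# Extremely simplified sketch of next-word prediction: count which word most
# often follows each word in training text, then "predict" accordingly.
# Real LLMs learn far richer patterns; this only shows the prediction framing.

from collections import Counter, defaultdict

training = ("the patient completed the test and the patient "
            "reported that the patient was tired").split()

follows = defaultdict(Counter)
for prev, nxt in zip(training, training[1:]):
    follows[prev][nxt] += 1   # count what follows each word

def predict_next(word):
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))
```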

2.6 Classic Stats vs. ML/AI

Criteria | Classic Inferential Models | ML/AI Models
Number of “trials” | Fits one set of parameters for predictor/grouping variables (static) | Uses multiple iterations to improve parameters by some strategy or algorithm (dynamic)
Primary focus or goal | Inferring likelihood and magnitude of variable effects on an outcome | Accurate prediction of an outcome, reducing the amount of error
Explainability of results | Reasonably able to interpret variables and the size of their contributions to the outcome | Depending on the model, can be relatively difficult to understand how individual variables predict the outcome
Difficulty of computation | Low technology demand; mostly calculable by hand | Depending on the model, extremely taxing; large data and technological demand
Number of variables | Fewer variables possible for certain tests (e.g., t-test); even multiple regression models restrict the total number | Many more predictor variables; predictive power often scales with additional relevant data

3 ML/AI in Neuropsychological Research

3.1 Scoring Tests with Trained Models

3.2 Behavior and Cognition

  • Eye-tracking on ROCFT
  • ML classified executive/memory impairments
  • (Kim et al., 2024)

3.3 Visuospatial Prediction

  • Intersecting Pentagons test
  • 13,777 drawings
  • Outperforms human scoring
  • Extracts geometric features
  • (Tasaki et al., 2023)

3.4 Diagnostic Classification

  • Meta-analysis: test ability to classify AD, MCI
  • Implications for borderline cases
  • (Battista et al., 2020)

3.5 Summary of Research Applications

  • ML/AI can extract insights from complex test data

  • Especially useful with large, multivariate sets

  • Most common in visuospatial testing so far

  • Can inform test selection and diagnostic decisions

4 Caution in the Clinic

4.1 Procedural Challenges

  • ML and “AI” tools may be costly to develop, run, or subscribe to (in the case of commercial software products), and thus may not be feasible to run on in-clinic hardware or to purchase within budget constraints (Crawford, 2021).

  • Clinicians and psychometrists need effective training in navigating and understanding these models to use and apply them (Hedderich et al., 2021)

  • Depending on the setup, it may be necessary to consider how patient data privacy and confidentiality may be at risk when interacting with a model (Chen & Esmaeilzadeh, 2024)

4.2 Ethical Concerns

  • Though some healthcare professionals may be excited and hold positive feelings toward AI (Catalina et al., 2023), others may be more wary and object to its use and the impact it may have on the workforce and staffing.

  • Providers may be wary of fully trusting these models to be congruent with their own judgments (Lebovitz et al., 2022), which is especially concerning given the high stakes of diagnostic accuracy in neuropsychological evaluations.

  • Patients may feel less trust in total or partial reliance on “AI” tools in medical care, compared to diagnostics performed only by providers (Clements et al., 2022)

4.3 Statistical Pitfalls

  • Unlike inferential statistical models, some ML and AI models suffer from the “black box” problem of not explaining why predictions are made, though work is being done to resolve this issue (Poon & Sung, 2021)

  • AI may “hallucinate”, i.e., produce results that are completely incorrect or fundamentally flawed, as it is effectively “guessing” or estimating a correct response (Alkaissi & McFarlane, 2023)

  • ML and AI are reliant upon their training data and are thus biased towards patterns existing in that data. When confronted with a case unlike those in the training data, the model is liable to be biased towards what it already “knows” (Barocas et al., 2023)

5 Future Steps

5.1 Conclusion

  • AI and machine learning are something to be excited about… with caveats.

  • While there are many exciting developments in computational techniques for analysis in neuropsychology, there are still many considerations to balance in applying these ideas

  • Return to why we use statistics and quantitative analysis in practice: does the use of these new technologies support and enhance the following?

    • Evidence-based assessment
    • Build rapport, trust, and education in patients
    • Improve validity and reliability of tests and results
    • Public and academic perception of neuropsychology as valuable patient service
    • Refinement of accuracy of diagnosis in assessment
  • Psychometrists should be deliberate in building up sufficient knowledge of these technological advances to weigh how to implement them in practice

5.2 Further Topics

  • If you’re interested in more topics surrounding the intersection of machine learning and evaluation:
    • Fuermaier & Niesten (2025) – AI used by patients to coach toward successful test feigning in ADHD evaluations; implications for test security and performance validity
    • Devine et al. (2024) - Using advanced models to attempt to better address cultural deficiencies and bias in tests
    • Halkiopoulos & Gkintoni (2024) - Application of ML/AI to computerized adaptive testing, more personalized assessment
    • Emsley (2023) - When AI fabricates citations and other evidence in support of its statements

5.3 Additional Resources

Presentations

  • See Advances in Neuropsych Assessment of Suspected TBI by Nicole Newman, PhD, CBIS & Juliette Nadershahi, BA, CSP at 1:00pm EST today!

Academic Articles

  • Two literature review articles dealing with medicine and AI
    • Jiang et al. (2017) - Pre-COVID
    • An et al. (2023) - During/post-COVID
  • Contact me at Quinton.Quagliano@trinity-health.org for full reference PDFs of any of the articles I cited today!

5.4 References

Alkaissi, H., & McFarlane, S. I. (2023). Artificial hallucinations in ChatGPT: Implications in scientific writing. Cureus. https://doi.org/10.7759/cureus.35179
An, Q., Rahman, S., Zhou, J., & Kang, J. J. (2023). A comprehensive review on machine learning in healthcare industry: Classification, restrictions, opportunities and challenges. Sensors, 23(9), 4178. https://doi.org/10.3390/s23094178
Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and machine learning: Limitations and opportunities. MIT Press. https://fairmlbook.org/
Battista, P., Salvatore, C., Berlingeri, M., Cerasa, A., & Castiglioni, I. (2020). Artificial intelligence and neuropsychological measures: The case of Alzheimer’s disease. Neuroscience & Biobehavioral Reviews, 114, 211–228. https://doi.org/10.1016/j.neubiorev.2020.04.026
Catalina, Q. M., Fuster-Casanovas, A., Vidal-Alaball, J., Escalé-Besa, A., Marin-Gomez, F. X., Femenia, J., & Solé-Casals, J. (2023). Knowledge and perception of primary care healthcare professionals on the use of artificial intelligence as a healthcare tool. Digital Health, 9, 20552076231180511. https://doi.org/10.1177/20552076231180511
Chen, Y., & Esmaeilzadeh, P. (2024). Generative AI in medical practice: In-depth exploration of privacy and security challenges. Journal of Medical Internet Research, 26, e53008. https://doi.org/10.2196/53008
Clements, W., Thong, L., Zia, A., Moriarty, H., & Goh, G. (2022). A prospective study assessing patient perception of the use of artificial intelligence in radiology. Asia Pacific Journal of Health Management. https://doi.org/10.24083/apjhm.v17i1.861
Crawford, K. (2021). Atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press.
Devine, S. A., Lee, M., Gurnani, A. S., Sunderaraman, P., Kourtis, L., Pratap, A., Low, S., Ho, K., Gifford, K. A., & Au, R. (2024). Using new science to reduce bias of old science: AI‐driven customization of culturally biased tests. Alzheimer’s & Dementia, 20, e091267. https://doi.org/10.1002/alz.091267
Duede, E., Dolan, W., Bauer, A., Foster, I., & Lakhani, K. (2024). Oil & water? Diffusion of AI within and across scientific fields. arXiv. https://doi.org/10.48550/ARXIV.2405.15828
Emsley, R. (2023). ChatGPT: These are not hallucinations – they’re fabrications and falsifications. Schizophrenia, 9(1), 52, s41537-023-00379-4. https://doi.org/10.1038/s41537-023-00379-4
Fuermaier, A. B. M., & Niesten, I. J. M. (2025). ChatGPT helps students feign ADHD: An analogue study on AI-assisted coaching. Psychological Injury and Law, 18(2), 97–107. https://doi.org/10.1007/s12207-025-09538-7
Halkiopoulos, C., & Gkintoni, E. (2024). Leveraging AI in e-learning: Personalized learning and adaptive assessment through cognitive neuropsychology—a systematic analysis. Electronics, 13(18), 3762. https://doi.org/10.3390/electronics13183762
Hedderich, D. M., Keicher, M., Wiestler, B., Gruber, M. J., Burwinkel, H., Hinterwimmer, F., Czempiel, T., Spiro, J. E., Pinto Dos Santos, D., Heim, D., Zimmer, C., Rückert, D., Kirschke, J. S., & Navab, N. (2021). AI for doctors—a course to educate medical professionals in artificial intelligence for medical imaging. Healthcare, 9(10), 1278. https://doi.org/10.3390/healthcare9101278
Hou, T., Li, M., Tan, Y. (Ricky)., & Zhao, H. (2024). Physician adoption of AI assistant. Manufacturing & Service Operations Management, 26(5), 1639–1655. https://doi.org/10.1287/msom.2023.0093
Jiang, F., Jiang, Y., Zhi, H., Dong, Y., Li, H., Ma, S., Wang, Y., Dong, Q., Shen, H., & Wang, Y. (2017). Artificial intelligence in healthcare: Past, present and future. Stroke and Vascular Neurology, 2(4), 230–243. https://doi.org/10.1136/svn-2017-000101
Kim, M., Lee, J., Lee, S. Y., Ha, M., Park, I., Jang, J., Jang, M., Park, S., & Kwon, J. S. (2024). Development of an eye-tracking system based on a deep learning model to assess executive function in patients with mental illnesses. Scientific Reports, 14(1), 18186. https://doi.org/10.1038/s41598-024-68586-2
Kühl, N., Schemmer, M., Goutier, M., & Satzger, G. (2022). Artificial intelligence and machine learning. Electronic Markets, 32(4), 2235–2244. https://doi.org/10.1007/s12525-022-00598-0
Langer, N., Weber, M., Vieira, B. H., Strzelczyk, D., Wolf, L., Pedroni, A., Heitz, J., Müller, S., Schultheiss, C., Tröndle, M., Lasprilla, C. A., Rivera, D., Scarpina, F., Zhao, Q., Leuthold, R., Jenni, O. G., Brugger, P., Zaehle, T., Lorenz, R., & Zhang, C. (n.d.). The AI neuropsychologist: Automatic scoring of memory deficits with deep learning.
Lavery, L. L., Lu, S., Chang, C.-C. H., Saxton, J., & Ganguli, M. (2007). Cognitive assessment of older primary care patients with and without memory complaints. Journal of General Internal Medicine, 22(7), 949–954. https://doi.org/10.1007/s11606-007-0198-0
Lebovitz, S., Lifshitz-Assaf, H., & Levina, N. (2022). To engage or not to engage with AI for critical judgments: How professionals deal with opacity when using AI for medical diagnosis. Organization Science, 33(1), 126–148. https://doi.org/10.1287/orsc.2021.1549
Miller, J. B., & Barr, W. B. (2017). The technology crisis in neuropsychology. Archives of Clinical Neuropsychology, 32(5), 541–554. https://doi.org/10.1093/arclin/acx050
Mitrushina, M., Boone, K., Razani, J., & D’Elia, L. (2005). Handbook of normative data for neuropsychological assessment (2nd ed). Oxford University Press.
Park, J. Y., Seo, E. H., Yoon, H.-J., Won, S., & Lee, K. H. (2023). Automating rey complex figure test scoring using a deep learning-based approach: A potential large-scale screening tool for cognitive decline. Alzheimer’s Research & Therapy, 15(1), 145. https://doi.org/10.1186/s13195-023-01283-w
Poon, A. I. F., & Sung, J. J. Y. (2021). Opening the black box of AI‐medicine. Journal of Gastroenterology and Hepatology, 36(3), 581–584. https://doi.org/10.1111/jgh.15384
Singh, S., & Germine, L. (2021). Technology meets tradition: A hybrid model for implementing digital tools in neuropsychology. International Review of Psychiatry, 33(4), 382–393. https://doi.org/10.1080/09540261.2020.1835839
Tasaki, S., Kim, N., Truty, T., Zhang, A., Buchman, A. S., Lamar, M., & Bennett, D. A. (2023). Explainable deep learning approach for extracting cognitive features from hand-drawn images of intersecting pentagons. Npj Digital Medicine, 6(1), 157. https://doi.org/10.1038/s41746-023-00904-w
Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29(8), 1930–1940. https://doi.org/10.1038/s41591-023-02448-8
Toh, T. S., Dondelinger, F., & Wang, D. (2019). Looking beyond the hype: Applied AI and machine learning in translational medicine. EBioMedicine, 47, 607–615. https://doi.org/10.1016/j.ebiom.2019.08.027
Wechsler, D. (2009). WMS-IV technical & interpretive manual. Pearson Assessments.
Wechsler, D. (2025). WAIS-5 technical & interpretive manual. Pearson Assessments.